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Abstract 

We present two novel approaches for the computation of the exact 
distribution of a pattern in a long sequence. Both approaches take into 
account the sparse structure of the problem and are two-part algorithms. 
The first approach relies on a partial recursion after a fast computation of 
the second largest eigenvalue of the transition matrix of a Markov chain 
embedding. The second approach uses fast Taylor expansions of an exact 
bivariate rational reconstruction of the distribution. 

We illustrate the interest of both approaches on a simple toy-example 
and two biological applications: the transcription factors of the Human 
Chromosome 5 and the PROSITE signatures of functional motifs in pro- 
teins. On these example our methods demonstrate their complementarity 
and their hability to extend the domain of feasibility for exact computa- 
tions in pattern problems to a new level. 

1 Introduction 

The distribution of patterns in state random sequences generated by a Markov 
source has many applications in a wide range of fields including (but not limited 
to) reliability, insurance, communication systems, pattern matching, or bioinfor- 
matics. In the latter field in particular, the detection of statistically exceptional 
DNA or protein patterns (PROSITE signatures, CHI motifs, regulation pat- 
terns, binding sites, etc.) have been very successful allowing both to confirm 
known biological signals as well as new discoveries. Here follows a short selection 
of such work: Karlin et al. [1992], van Helden et al. [1998], Brazma et al. [1998], 
El Karoui et al. [1999], Beaudoing ct al. [2000], Frith et al. [2002], Hampson 
et al. [2002], Leonardo Mariho-Ramirez and Landsman [2004]. 

From the technical point of view, obtaining the distribution of a pattern 
count in a state random sequence is a difficult problem which has drawn a 
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considerable interest from the probabilistic, combinatorics and computer science 
community over the last fifty years. Many concurrent approaches have been 
suggested, all of them having their own strengths and weaknesses, and we give 
here only a short selection of the corresponding references [see Rcignier, 2000, 
Lothaire, 2005, Nuel, 2006b, for more comprehensive reviews]. 

Exact methods are based on a wide range of techniques like Markov chain 
embedding, probability generating functions, combinatorial methods, or expo- 
nential families Fu [1996], Stefanov and Pakcs [1997], Antzoulakos [2001], Chang 
[2005], Boeva et al. [2005], Nuel [2006a], Stefanov and Szpankowski [2007], Boeva 
et al. [2007]. There is also a wide range of asymptotic approximations, the most 
popular among them being: Gaussian approximations Pcvzner et al. [1989], 
Cowan [1991], Kleffe and Borodovski [1997], Prum et al. [1995], Poisson ap- 
proximations Godbolc [1991], Gcske et al. [1995], Reincrt and Schbath [1999], 
Erhardsson [2000] and Large deviation approximations Dcnisc ct al. [2001], Nuel 
[2004]. 

More recently, several authors [Nicodcme et al., 2002, Crochemore and Ste- 
fanov, 2003, Lladscr, 2007, Nuel, 2008, Ribeca and Raineri, 2008] pointed out 
the strong connection of the problem to the theory of pattern matching by pro- 
viding the optimal Markov chain embedding of any pattern problem through 
minimal Deterministic Finite state Automata (DFA). However, this approach 
remains difficult to apply in practice when considering high complexity patterns 
and/or long sequences. 

In this paper, we want to address this problem by suggesting two efficient 
ways to obtain the distribution of any pattern of interest in a (possibly long) 
homogeneous state Markov sequence. 

The paper is organized as follow. In the first part, we recall the principles of 
optimal Markov chain embedding through DFA, as well as the associated prob- 
ability generating function (pgf) formulas. We then present a new algorithm 
using partial recursion formulas. The convergence of these partial recursion for- 
mulas depends on a (fast) precomputation of the second largest eigenvalue of the 
transition matrix of a Markov chain embedding. The next part takes advantage 
of state-of-the-art results in exact computation to suggest a very efficient way to 
obtain the bivariate pgf of the problem through rational reconstruction. Once 
this involving precomputation has been performed, fast Taylor expansions, using 
the high-order liftings of Storjohann [2003], can very quickly reveal the distribu- 
tion of any pattern count. We then apply our new algorithms successively to 
a simple toy-example, a selection of DNA (Transcription Factors) patterns, and 
to protein motifs (PROSITE signature). In all cases, the relative performance 
of the two algorithms are presented and discussed, highlighting their strengths 
and weaknesses. We conclude with some perspectives of this work. 
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2 DFA and optimal Markov chain embedding 



2.1 Automata and languages 

In this part we recall some classical definitions and results of the well known 
theory of languages and automata [Hopcroft et al., 2001]. 

We consider A a finite alphabet whose elements are called letters. A word 
(or sequence) over ^ is a sequence of letters and a language over ^ is a set of 
words (finite or not). We denote by e the empty word. For example ABBABA is 
a word over the binary alphabet A = {A, B} and C = {AB, ABBABA, BBBBB} is a 
(finite) language over A. 

The product £i • £2 (the dot could be omitted) of two languages is the lan- 
guage {wiW2,Wi £ Ci,W2 S C2} where W1W2 is the concatenation - or product 
- of wi and W2. If £ is a language, £" — {wi . . .Wn with wi,...Wn G C} 
and the star closure of C is defined by C* = U„^o£"- The language A* 
is hence the set of all possible words over A. For example we have {AB} • 
{ABBABA, BBBBB} = {ABABBABA, ABBBBBB}; {AB}^ {ABABAB} and {AB}* = 
{e, AB, ABAB, ABABAB, ABABABAB . . .}. 

A regular language is either the empty word, or a single letter, or obtained by 
a finite number of regular operations (namely: union, product and star closure). 
A finite sequence of regular operations describing a regular language is called 
a regular expression whose size is defined as its number of operations. A* is 
regular. Any finite language is regular. 

Definition 1. If is a finite alphabet, Q is a finite set of states, a £ Q is 
a starting state, J- C Q is a subset of final states and S : Q x A ^ Q is 
a transition function, then {A, Q, a, S) is a Deterministic Finite Automaton 
(DFA). For all af — ai . . . Od-iOd £ A"^ {d 2) and q £ Q we recursively 
define S{q,af) = S{S{q,af-^),ad). A word w £ A''' is accepted (or recognized) 
by the DFA if S{a, w) £ T . The set of all words accepted by a DFA is called its 
language. See in Figure 1 a graphical representation of a DFA. 

We can now give the most important result of this part which is a simple 
application of the classical Kleene and Rabin & Scott theorems [Hopcroft ct al., 
2001]: 

Theorem 2. For any regular language C there exists a unique (up to a unique 
isomorphism) smallest DFA whose language is £. If we denote by E the size of 
the regular expression corresponding to £, then the size R of the smallest DFA 
is bounded by 2^ . 

For certain specific patterns (e.g. A^w where w is a simple word), a minimal 
DFA can be built directly using ad hoc construction or well-known algorithms 
[e.g. Aho-Corasick algorithm. Alio and Corasick, 1975]. In general however, 
building a minimal DFA from a regular expression usually requires three steps: 

1) turning the regular expression into a Nondeterministic Finite Automaton - 
NFA - [Thompson's or Glushkov's algorithm, AUauzcn and Mohri, 2006]; 
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Figure 1: Graphical representation of the DFA {A, Q,a,J-,S) with A = {A,B}, 
Q = {0, 1, 2, . . . , 10, 11}, a = 0,T = {11} and (5(0, A) = 1, (5(0, B) = 0, S{1, A) = 
1, (5(1,8) = 2, (5(2, A) = 3, ,5(2, B) - 4, 5(3, A) = 5, (5(3, B) - 1, 5(4, A) = 5, 
(5(4, B) = 0, (5(5, A) = 6, (5(5, B) = 2, (5(6, A) = 7, 5(6, B) = 8, 5(7, A) = 9, 
5(7, B) = 2, 5(8, A) = 10, 5(8, B) = 4, 5(9, A) = 1, 5(9, B) = 11, 5(10, A) = 5, 
5(10, B) = 11, 5(11, A) = 3 and 5(11, B) = 4. This DFA is the smallest one that 
recognize the language £ = A*Wi with A = {A,B}, Wi = AByl^AAyliAB and 
hence |>Vi| =4. 

2) producing a DFA from the NFA [determinization; Subset construction, see 
Hopcroft et al., 2001, Section 2.3.5]; 

3) minimizing the DFA by removing unnecessary states [for minimization; Hopcroft 's 
algorithm, see Hopcroft, 1971, Hopcroft et al., 2001, Section 4.4.3]. 

As stated in Theorem 2, the smallest DFA may have a total of 2^ states in 
the worse case. However, this upper bound is seldom reached in practice. This 
may be observed in Table 1 where the practical value of R is far below the upper 
bound. 

One should note that the complexity R of real-life patterns is quite different 
from one problem to another. For example in Nuel et al. [2010], the authors 
consider a total of 1, 276 protein signatures from the PROSITE database which 
complexities range from R = 22 (RGD motif) to i? = 837, 507 (APPLE motif, 
PS00495), with a mode around R = 100. 

2.2 Connection with patterns 

We call pattern (or motif) over the finite alphabet A any regular language (finite 
or not) over the same alphabet. For any pattern W any DFA that recognizes the 
regular language A*yV is said to be associated with W. According to Theorem 2, 
there exists a unique (up to unique isomorphism) smallest DFA associated with 
a given pattern. For example, if we work with the binary alphabet A = {A, B}, 
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\m 
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16 


64 256 1 024 


4 096 


16 384 65 536 


262 144 


1 048 576 


2E 


512 


2 048 


8192 32 768 1.3 x 10^ 


5.2 X 10^ 


2.1 X 10** 8.4 X 10^ 


3.4 X 10'' 


1.3 X 10^ 


R 


12 


27 


57 122 262 


562 


1207 2 592 


5 567 


11957 


F 


1 


3 


6 13 28 


60 


129 277 


595 


1278 



Table 1: Characteristics of the smallest DFA that recognizes the language C = 
A*Wk with A = {A,B} and Wk = ABy^'^AAyl'^AB. The pattern cardinality is 
\Wk\ = 2'^ X 2'^ — A^, R is the total number of states, F the number of final 
states, and 2^ = 2^+^*^ is the theoretical upper bound of R. 




Figure 2: Graphical representation of the smallest DFA associated with W2 = 
ABA'^kkA^AB with A = {A,B}. This DFA has R = 27 states including F = 3 
final states. 

then the smallest DFA associated with Pattern Wi = AB^^AA^^AB has R = 12 
states and F = 1 final state (see Figure 1), and the smallest DFA associated 
with Pattern >V2 = AByl^AAyl^AB has R = 27 states and F = 3 final states (see 
Figure 2). 

It is well known from the pattern matching theory [Gormen et al., 1990, 
Grochemore and Hancart, 1997] that such a DFA provides a simple way to find 
all occurrences of the corresponding pattern in a sequence. 

Proposition 3. Let W be a pattern over the finite alphabet A and {A, Q, 
(T, J^, 6) be a DFA that is associated to W. For all deterministic sequence 
x{ = X1X2 ■ ■ - Xi over A^ we recursively define the sequence j/q — j/oJ/i ■ ■ - Vi over 
Qhy — a and yi — 5{yi-i, Xi). For all 1 ^ i ^ ^ we then have the following 
property 1; x\ ^ A*W y, G T. 

Proof. Since the DFA is associated to W, x\ e A*W is equivalent to 5{a,x\) S 

a A*W means that an occurrence of W ends in position i in x^. 
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T. One can then easily show by induction that b(a^x\) — yi and the proof is 
achieved. □ 



Example 4. Let us consider the pattern Wi — AByl^AAyl^AB over the binary 
alphabet A = {A,B}. Its smallest associated DFA is represented in Figure 1. If 
xf^ = ABAAABBAAAABBAABABAB is a binary sequence, we build the sequence y^^ 
according to Proposition 3 and we get: 

2:f=-ABAAABBAAAA B BAAB A B AB 
yf = 012356845679 11 4568 10 11 32 

where final states arc in bold. We hence see two occurrences of Wi : one ending 
in position 12 (ABBAAAAB) and one in position 18 (ABBAABAB, which overlaps the 
previous occurrence). 

If this approach may be useful to localize pattern occurrences in deterministic 
sequences, one should note that NFA (Nondeterministic Finite Automata) are 
usually preferred over DFA for such a task since they are far more memory 
efficient and can achieve similar speed thanks to lazy determinization [Le Maout, 
2011]. The DFA-based approach however has a great advantage when we work 
with random sequences. 

2.3 Markov chain embedding 

Let A"^ be an homogeneous'^ an order m ^ 1 Markov chain over A whose 
starting distribution and transition matrix are given for all (a, 6) e y. Ahy 



H{a) =^ P(A{" = a) and 7r(a, &) = P(A, = h\Xizl, = a). Let now W be a 



regular expression over A, our aim is to obtain the distribution of the random 
number of occurrences of W in Af defined'^ by: 



where If is the indicatrix function of event £ (the function takes the value 1 is 
£ is true, else). 

Let {A, Q, (J, S) be a minimal DFA associated to W. We additionally 
assume that this automaton is non m-ambiguous [a DFA having this property 
is also called an m-th order DFA in Lladser, 2007] which means that for all 

q € Q, S~'^(p) =^ {a™ e A^, 3p G Q,d {p, a™) = q} is either a singleton, or the 
empty set. When the notation is not ambiguous, 6~"^{p) may also denote its 
unique element (singleton case). We then have the following result: 

^Please note that Theorem 5 can be easily generalized to heterogeneous Markov chains, 
but we focus here on the simpler case since our computational approaches are only valid for 
homogeneous Markov chains. 

•^For simplification purpose, we deliberately ignore possible occurrences of W in XJ". 
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(1) 



i—m-\-l 
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def 

Theorem 5. We consider the random sequence over Q defined by Fq = cr 
and Y, = S{Y,_i,X,), Vi, 1 ^ i ^ 1 Then (Y^) 

i>m is a homogeneous order 
1 Markov chain over Q' = 6{a,A"'A*) such as, for all p,q G Q' and 1 < 
i ^ £ — m, the starting vector Up =^ P = p) and the transition matrix 
Tp,5 =^ P (l^j+m = q\Yi+^_i = p) are given by: 



otherwise 



(2) 



~ \ otherwise ^'^> 
and we have the following property: 

Xi e A*W ^ Y,eT. (4) 

Proof. The result is immediate considering the properties of the DFA and Propo- 
sition 3. See Lladscr [2007] or Nuel [2008] for more details. □ 

From now on, we consider the decomposition T = P + Q where for all 
P, 9 G Q' we have: 



Tp,, ifq^T _ r ifg^^ 

p^i-^ n ifqeJ^ ^P'^ ^ { Tp,, if g e ^ • 

We finally introduce the dimension R |Q'| column- vector v (1, . . . , 1)^, 
where ^ denote the transpose symbol. 

Corollary 6. With the same hypothesis and notations as in Theorem 5, the 
probability generating function (pgf ) of Ng is then explicitly given by: 



G,(y) = J2 = n)y" = u(P + yQ)^-"v (6) 



G(2/, z) = J2 E W = n)y"^' = E ^i(y>' = u(I-zP-yzQ)- Vz™ (7) 



and we also have: 

dcf 

where I denotes the identity matrix. 



Proof. It is clear that uT*" gives the marginal distribution of Ye. If we now intro- 
duce the dummy variable y to keep track of the number of pattern occurrences, 
then u(P + yQ,Y gives the joint distribution of (Yi, Ng). Marginalizing over Yi 
through the product with v hence achieves the proof of Equation (6). Equa- 
tion (7) is then obtained simply by exploiting the relation J2k>o '^'^ ^ (I— M)^^ 
withM = z{P-h2/Q). " □ 
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Example 7. Considering the same pattern and associated DFA as in Example 
4, one can directly apply Theorem 5 to get the expression of T, the transition 
matrix ofY^ over Q = {0,1,2,3,4,5,6,7,8,9,10,11}: 



T = 



7I'B,B 
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where Tra,b = ^{^2 = b\Xi = a) for all a,b £ {A, B}. 



3 Partial recursion 

We want here to focus directly on the expression of Gg{y) in Equation (6) rather 
than exploiting the bivariate expression G{y,z) of Equation (7). A straightfor- 
ward approach consists in computing recursively u(P + yQY (or conversely 
(P + ?/Q)V) up to i — £ — m taking full advantage of the sparse structure of 
matrices P and Q at each step. This is typically what is suggested in Nucl 
[2006a]. The resulting complexity to compute V{Ng — n) is then 0(0 x n x £) 
in time and 0(f2 x n) in space, where = R x \A\ is the number of nonzero 
elements in P + Q. This straightforward (but effective) approach is easy to 
implement and is from now on referred as the "full recursion" Algorithm. 

Another appealing idea is to compute directly the matrix (P + j/Q)^""* us- 
ing a classical dyadic decomposition of £ — m. This is typically the approach 
suggested by Ribeca and Raincri [2008]. The resulting complexity to obtain 
F{N( = n) is 0{R'^ x x \og£) in time and 0{R^ x n x \og£) in space. The 
algorithm can be further refined by using FFT-polynomial products (and a very 
careful implementation) in order to replace the quadratic complexity in n by 
O(nlogn). The resulting algorithm however suffers from numerical instabili- 
ties when considering the tail distribution events and is therefore not suitable 
for computing extreme p- values. If this approach might offer interesting perfor- 
mance for relatively small values of R and n, its main drawback is that it totally 
ignore the sparse structure of the matrices and therefore fail to deal with highly 
complex patterns (large R). 

Here we want to introduce another approach that fully takes advantage of 
the sparsity of the problem to display a linear complexity with R (as with 
full recursion) but also dramatically reduces the complexity with the sequence 
length £ in order to be suitable for large scale problems. 
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From now on, let us assume that P is an irreducible and aperiodic positive 
matrix and we denote by A its largest eigenvalue (we know that A > thanks 

to Perron- Frobenius). Let us define P =^ P/A and Q =^ Q/A. 

For all i ^ and all fc ^ we consider the dimension R column-vector 
Fk{i) = [y'^KP + yQ)V. By convention, Fk{i) = if i < 0. It is then possible 
to derive from Equation (6) that V{Ni = k) = [y'']F{y) = A^~"uFfc(£- m). 
Additionally, we recursively define the dimension R column- vector D*^(i) for all 

fc,z, j ^ by DO.(z) = Ffe(i) and, if z ^ 1 and j ^ 1, Blr\i) ~ 

- 1) so that ^ ELoi'^Y i^)Fk{t - 5). 

Lemma 8. For all j ^ 0, fc ^ 1, and i ^ j we have: 

T>i{^ + l)^Pnii^) and Di(i + 1) = PD^W + QDi_i(z). (8) 

Proof. The results for j = come directly from the definition of D^(i) — Fk{i), 
the rest of the proof is achieved by induction. □ 

From now on, all asymptotic developments in the current section are sup- 
posed to be valid for i +oo. 

Lemma 9. For all fc ^ 0, it exists a dimension R column- vector C^: such as: 

D^:(i)-Cfc + 0(^.^/('=+i)) (9) 

where v denotes the magnitude of the second eigenvalue of P when we order 
the eigenvalues by decreasing magnitude. 

Proof. It is clear that DQ(i) — P*v, elementary linear algebra hence proves the 
lemma for fc = with Co — P°°v. We assume now that Equation (9) is proved 
up to a fixed fc — 1 ^ 0. A recursive application of Lemma 8 gives for all a ^ fc 
andi^^O that D*;(i -I- a) = P'D^(a) -I- ^2]=! 'P'^^Q^t-iU - 1 + a)- Thanks 
to Equation (9) it is clear that the second term of this sum is a 0(j/"/'=), and 
we then have D^(i + a) = P°°D^(a) + 0{iy') + 0{v°'/^). Therefore we have 
D^+^z + a)^ 0{v') + C>(z/"/'=). For any j ^ fc -f 1, if we set a = j^, then 

we obtain D^+^(j) = 0{v^/^^^'^^), which finishes the proof. □ 



Proposition 10. For all j ^ 0, a ^ and z ^ j + a we have: 
k ,. 

j'=j ^ J / \ J 

and in particular for j = we have: 

= E f ^ " , ' ^ D{ (a + ./) + - O (."/(-^)) . (11) 
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Input: The matrices P, Q, the vectors u, v, n+1 column-vectors Ao, Ai, 
An of dimension R, n + 1 real numbers Ro,Ri, ■ ■ ■ , Rn, and a real number C 
II Compute A through the power method 
Ao ^ V, A 1 

while A has not converged with relative precision e do 
A <— P Ao / Aq (point-wise division) and Ao <— P Ao 
// Normalize P and Q 
P ^ P/A and Q Q/A 

// Compute a such as Cn = D^(a) 
Ao V 

for i = 1 . . . n do 
for k = i . . .1 do 

Afe PAfe + QAfe_i — Afe / / so that Ak now contains D^.(i) 
Ao <— PAo — Ao 1 1 so that Ao now contains Do(i) 
for i = n + 1 . . .£ — m do 
for k = n . . . 1 do 

Ak PAfc + QAfc_i // so that Ak now contains D^(i) 
Ao PAo 1 1 so that Ao reow; contains Dg (i) 
if A„ as converged towards C„ with relative precision r; then 
a •«— i and break 
// Compute all D°(a — n) for ^ k ^ n 
Ao V 

for i = 1 . . . a — n do 
for A; = n ... 1 do 

Afc •«— PAii -|- QAfc_i 1 1 so that Ak now contains D°(i) 
Ao <— P A() / / so that Aq now contains Dq (i) 

// Compute Rk = Pk,i-m{c»- — n + k) for all ^ k ^ n 
for fc = . . . n do 

Rk ^ uAfc 
C ^ 1.0 

for k = 1 . . .n do 
for J = n . . . 1 do 

Aj PAj -I- QAj_i — PAj 1 1 so that Aj now contains T>j{a — n + k) 
Ao <— PA() — A() 1 1 so that Ao now contains Do(a — n + k) 
C^Cx(£-d-a + n-k+ l)/fe 
for J = fe . . . n do 

Rk Rk + CuAfe 
Output: return P(Ar<i = k) = Rk for all ^ ft ^ n 

Algorithm 1: Compute P(A''^ = fc) with relative error r/ for all < fc < n using 
floating point arithmetic with a relative error e (< r/). 
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Proof. This is proved by induction, using repetitively Lemma 9 and the fact 
that T>i{i) = T>i{a + j) + D^^+^(a +,] + !) + ...+ D^fc+^(i). □ 

Corollary 11. For any a ^ 0, for any k ^ and £ — we have: 

k 

V{Ni = k)^ A^~™ 

Pk.l^rn.(a) 




and Pk,t-m{o!) approximates V{Ni — k) with a relative error of: 

V{Ni = k) 

These results lead to Algorithm 1 which allows to compute P(iV£ = k) for 
all ^ A; ^ n for a given ti ^ 0. The workspace complexity of this algo- 
rithm is 0(n X R) and since all matrix vector products exploit the sparse 
structure of the matrices, the time complexity is 0{a x n x |^| x i?) with 
a = 0{n^ log(£)/log(iy-i)) 

where v is the magnitude of the second largest eigenvalue. 

4 High-order liftings 

Another appealing idea consists of using Equation (7) to compute explicitly the 
rational function G{y, z) and then performing (fast) Taylor expansions, first in 
z, then in y in order to get the appropriate values of V{Ni — n) [Nicodeme 
et al., 2002, Lladser, 2007]. 

There, a really naive approach would solve the bivariate polynomial system 
over the rationals. In the solution, the polynomials would be of degree at most 
R in each variable and the rationals of size at most R\og2{R), thus yielding 
a binary cost of 0{R^logl{R)) already for the computation of G{y,z) and a 
further cost of the same magnitude for the extraction of the coefficient of degree 
£ in the development with e.g. linear system solving. 

Since this resulting complexity is usually prohibitive, we first use modular 
methods and Chinese remaindering to compute with polynomials and rationals 
so that the cost of the arithmetic remains linear. Also to take advantage of 
the sparsity we do not invert the matrix but rather compute directly a rational 
reconstruction of the solution from the first iterates of the series. Equation 
(6). We thus only use sparse matrix- vector products and do not fill the matrix. 
Note that this is equivalent to solving the system using iterative 2;-adic methods. 
Finally, we compute only small chunks of the development of Equation (7) using 
the "high-order" lifting of Storjohann [2003], or the method of Fiduccia [1985], 



/ i — m - a - k + j'\ ... 
( MuDi(a + /) 



¥{Ni, = k) 



m - 
k 



O 



(^jjC./(k + l)\^ 
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modulo y"^^. We aim to compute only all the coefRcients gi^n — ^{Ni = n) 
for a given i and a given n G [n,v], for an interval [/i, z^] with a small v, but 
where ^ is potentially large. Thus we want to avoid computing the whole Taylor 
development of the fraction. 

Indeed, let G{y,z) — with B,A€: Q[y,z] of degrees cIaz = deg(^, z) 

and dsz = deg(i?, z) < dAz — 1- Overall let us denote by d a bound on dAz and 
thus on dsz- We can assume that A and B are polynomials since we can always 
pre- multiply them both by the lowest common multiple of their denominators. 

Thus B — J2i=o ^i(y)^^ and if we denote by [z']P(z), the z-th coefficient of 
the polynomial P: 

[/]Giy,z)Jj2bMxW-'] (^) 

Now, the idea is that for a given coefficient £, the only coefficients of the 
development oi 1/A that are needed are the coefficients of order ^ — m to ^, 
denoted by [A^^Y^^^^. This is precisely what Storjohann's "High-order" lifting 
can compute quickly. 

We will use several Chinese remaindering and rational reconstructions. 

In our cases, we have a series 5*-^* which is actually equal to a fraction 
by definition. Therefore, provided that we consider sufficiently many of the 
first coefficients in the development, the rational reconstruction of the truncated 
series Yl^"^ giz"^, even with rational polynomial in y as coefficients, wi/Z eventually 
yield A{z) and B{z) as solutions. 

4.1 Rational reconstruction 

The first step is thus to recover both polynomials B and A from the series 
development of Equation (7). Now, one could compute the whole rational re- 
construction of a polynomial over the domain of the rationals, then using x 
operations for the domain arithmetic, which would yield a d^ complexity to com- 
pute all the d^ coefficients. We rather use two modular projections, one for the 
rational coefficients, one for the polynomials in y, in order to reduce the cost of 
the arithmetic to only d^. Then the overall cost will be dominated by the cost of 
the computation of only d coefficients of the series, as shown in proposition 12. 
Our algorithm has then four main steps: 

1) We take the series in z modulo some prime numbers (below 2'^ where 7 is 
e.g. close to word size) and some evaluation points on the variable y; 

2) We perform a univariate polynomial fraction reconstruction in z over each 
prime field and for each evaluation point; 

3) We interpolate the polynomials in y over the prime field from their evalua- 
tions; 
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4) We finally Chinese remainder and rational reconstruct the rational coeffi- 
cients from their modular values. 

The details of the approach are given in Algorithm 2 whose complexity is 
given by the following proposition. 



Input: The matrices P, Q and the row and column vectors u, v defining; 
G{y,z). 

Output: Polynomials B{y, z) and A{y, z) of degree < d with G(y, z) = 

1: Let G{y,z) = uv, wo(y) = v; 
2: for n = 1 to 2c? do 
3: u„(y) = (P + ?/Q)w„„i(?;); 
4: G(y,z) = G(y,z) + U'i;„(y)z"; 

5: Let d= (2d + 2)log2.(||P,Q,u,v||oo); 
6: for i = to d do 

7: Let Pi > 2'' > d he & prime, coprime with po, ■ • • 7^1-1; 
8: Gi{y,z) ^ G{y,z) mod p^; 
9: for i — Q to d do 

10: Let yj be an element modulo pi, distinct from j/Oi • • ■ i J/j-ii 
11: Gij{z) ^ Gi{yj,z) modp,;; 

12: ^^^^i^ =FractionReconstruction(Gi.j(z)) mod mod {y ~ yj); 

II Bij{z) = Yfu^^Bij^riz"^ and Aij{z) = I]«=o 
13: for 71 = to d do 

14: Interpolate Bi n{y) mod pi from Bi j ^ for j ~ Q..d] 
15: Interpolate Ai^n{y) mod pi from Aij,n for j = 0..d; 

II Bi{y,z) = E5!=o-^^n(y)^" andA,{y,z) = E^^=o ^^"(y)^" 
16: for n = to (i do 
17: for / = to d do 

18: ChineseRemainder [y^z'']B{y, z) from [y"z'']Bi{y, z) for i — Q..d; 
19: ChineseRemainder [y"^ z'']A{y , z) from [y'^z^]Ai{y, z) for i = 0..d; 
20: RationalReconstruct both obtained coefficients; 

Algorithm 2: Modular Bivariate fraction reconstruction over the rationals. 



Proposition 12. Let d — maxjd^, ds} be the degree in z and i' be the largest 
degree in y of the coefficients of A and B. Let = |.4|ii! be the total number 
of nonzero coefficients of the Rx R matrices P and Q. If the coefficients of the 
matrices P, Q, u and v, and |^| are constants, then the cost of the computation 
of B and A in Algorithm 2 is 

O {d^Rlog{R)) , 

where the intermediate memory requirements are of the same order of magni- 
tude. 
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Proof. Polynomial fraction reconstruction of degree d requires 2d coefficients. 
The computation of one coefficient of the evaluated series modulo costs one 
matrix-vector product, ft word operations, and a dot product of size R < fl. 
By definition deg{gj{y)) = deg((P + yQ)-') < j, thus u <2d and, similarly, the 
size of the rational coefficients is bounded by {2d + 2) log(i?||P, Q, u, v||^). 
Thus steps 3 and 4 in Algorithm 2 cost 

d 

J2 0{nn' log{R\ |P, Q, u, v| 1^) = 0{d^n log(i?||P, Q, u, v||^) 
operations. 

Then a fraction reconstruction of degree d costs less than 0{d^) operations 
by Berlekamp-Massey or extended Euclidean algorithm and an interpolation of 
d points costs less than 0{d^) operations so that, overall, 

steps 12, 14 and 15 cost less than 0{d'^) operations. Then Chinese remain- 
dering and rational reconstruction of size d costs less than 0{d^) for an overall 
cost of 

0(d4log2(||P,Q,u,v|U). 

As d < R < Q, if logjT (||P, Q, u, v||oo) — 0(1), then this latter term is domi- 
nated by 0{d^n\og{R)). Finally, if |.4| is constant, we have that = 0{R). 

Now, the memory requirements are bounded by those of the series. The 
vector Vn{y) is of size i?, of degree n in y with rational coefficients of size 
nlog(i?||P,Q,u,v||^). Thus Vn{y) and the dot product uw„(y) are of size 
0{Rn'^\og{R)) so that G{y,z) = E^loUw„(y) is 0{R\og{R)d^). □ 

Thus the dominant computation in Algorithm 2 is the computation of the 
first terms of the series G(y, z). 

4.2 Early termination strategy for the determination of 
the fraction degrees 

There remains to determine the value of the degree d. As the series is the 
solution of a polynomial linear system of size R and degree 1, the determinant, 
and thus the denominator and numerator of the solution are of degree bounded 
by R. Now in practice, we will see that very often this degree is much smaller 
than R. As the complexity is cubic in d it is of major importance to determine 
as accurately as possible this d beforehand. 

The strategy we propose is an early termination, probabilistic of Las Vegas 
type, i.e. it is always correct, but sometimes slow. We reconstruct the rational 
fraction only from a small number do of iterations, and then again after do + 1 
iterations. If both fractions are identical with numerator and denominator of 
degrees strictly less than do, then there is a good chance that the recovered 
fraction is the actual one. This can be checked by just applying the matrix A 
to the obtained guess and verifying that it corresponds to the right-hand side. 

If the fractions disagree or if the check fails, one can then try again after 
some more iterations. 
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In our bivariate case over the rationale, we have a very fast strategy which 
consists in finding first the degree at a single evaluation point modulo a single 
prime and e.g. roughly doubling the number of iterations at each failure. This 
search thus requires less than 2 x 2d iterations and has then a marginal cost of 
0{dn + d^). 



4.3 High-order lifting for polynomial fraction development 

Once the bivariate fraction is recovered, the next step is to compute the 
coefficients of degree £ S [a,f3] of its series development. The idea of the high- 
order lifting of Storjohann [2003] is to make use of some particular points in the 
development, that are computable independently. Then these points will enable 
fast computation of only high-order terms of the development and not of the 
whole series. In the following, we call these points residues. 
We first need the fundamental Lemma 13. 

Lemma 13. Let A,B E M.[z] be of respective degrees dA and ds < dA — 1- Then 
for all £ G N, there exists Bi e M.[z] of degree < dA — l and {gi)i=Q..e-i E 
such that 

A{z) Aiz) 

Proof. We use the same construction as Salvy [2009]: the initial series rewrites 

as, f = ESo 9iz" = T.lZo gtz'+z^ "ZZo ■ Then let Bi = B-A (E^o 9iz" 

By construction, degree{Bi) = dA+i—^, but we also have that i?£ ~ z^ X^i^o 9i+i^^ 
We thus let Bg = Biz~^ which satisfies the hypothesis. □ 

The question is how to compute efficiently the £th residue Bi defined in 
Lemma 13. The idea is to use the high-order lifting of Storjohann [2003]. 

We follow the presentation of the lifting of Salvy [2009] but define a slightly 
different bracket notation for chunks of a polynomial: 

[A{z)]i^a^z'- + ...+apzP (12) 

Roughly there are two main parts. The first one generalizes the construction 
of Lemma 13 using only 2c? coefficients of A~'^ . The second part builds small 
chunks of size 2d of A~^ at high-orders, each being close to a power of 2. 

The efficient computation of residues given in Algorithm 3 takes simply 
advantage of the fact that a given residue has a small degree and depends only 
on a small part of the development of A~^ . We first give a version where the 
adequate part of A is given as input. We will later detail the way to efficiently 
compute these coefficients. 

Lemma 14. The arithmetic complexity of Algorithm 3 is 2d'^ operations. 
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Input: A, B, j and V = [A-^Y^^ld+i- 
Output: Bj defined by Lemma 13. 

1: Compute U =^ [VByrli; 

2: Return B, = z-3[B - AU]]-^"^^^; 

Algorithm 3: Rcsidue(A,B 



Tlicn, we define to be tfie liigli-order residue of tlie expansion oi A ^, 
using Lemma 13 with B = 1: 

T.9^^'^^i^' (13) 



A{z) A[z) 

Tlie idea of the fast hfting is that when substituting A^^ in the right hand side 
of Equation 13 by this actual right hand side, one gets: 

This shows that Y^i depends only on and of chunks of A~^ ^ of size d, at 
and around V t-, more generally one gets the following formula: 

« r n „ 1 /9-^ 



{A-X = z^' ^,{A-X--\_, ^ (14) 



a-l 

which states, from Equation (13), that the Taylor coefficients of order greater 

r { z\ 

than £, can also be recovered from the Taylor development of ^(^ J^ . 

Then the second part of the high-order lifting is thus Algorithm 4 which gets 
a small chunk of A~^ at a double order of what it is given as input, as shown 
in Figure 3. 

Input: S = = [-4"i]2llL+i and V^..^ defined by eq. (13). 

Output: F2e+i_d and K+i = [^"^lal+^lL+r 
1: Compute Yi^ ^^^-'^[Fse-dKl'Ild; // e?- (H) 

2: Compute F2e+i_rf '^^ _ AVLlal+ll^; // Residue{A, 1, 2"+^ - d) 

3: Compute Vh z^^^^^dfr^.+i^,^]^-!; // eq. (H) 

4: Return F2.+i_rf and K+i = [A-%X\z\d+i =^ [^^IjI^IIm+i + ^h- 



Algorithm 4: Double-Order ( S*, T, F, e). 

Lemma 15. The arithmetic complexity of Algorithm 4 is id^ operations. 
Proof. Below are the complexity of the successive steps of Algorithm 4 
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Ve 

2e-l 



T 



*- Degrees 

2.+1 



Figure 3: Computing chunks at double order. 



1) One truncated polynomial multiplication of degree d — 1, complexity d^; 

2) One truncated polynomial multiplication of degree d — 1, complexity d^; 

3) One truncated polynomial multiplication of degree d — 1, complexity d^. 

4) No-op, V'^^^ and Y^"'^ do not overlap. 

□ 

Then Algorithm 5 gives the complete precomputation to get a sequence of 
doubling order F's which will enable fast computations of the chunks of the 
Taylor expansion. 



Input: A polynomial A{z) of degree d. 
Input: A valuation a and a degree f3 > d. 
Output: eo s.t. 2"°-^ < 2d < 2^° ; e^j s.t. 2"^ < (3 + d < 2'='^+!. 
Output: The Taylor development of ^ up to 5 = max{2^'' — 1; /3 — a}. 
Output: (Faco-d, • • ■ ,r2-/3_d). 
1: Compute eo s.t. 2"°-^ < 2d < 2^° ; ep s.t. 2''f < l3 + d < 2'='^+!; 

2: Let Co = fceo =^ 2'^" - d and (5 =^ max{2'=" - 1; ;9 - a}; 

3: Compute S =^ [A^^Jq, via Taylor expansion of A; 

4: Uo [A-%l-J, = [S]t-,; 

5: Compute Fjo =^ [/ - ^J7o]|+'^"^ // i?esidue(A, 1, Co) 

7: for i = eo + 1 to 6/3 do 

8: k, 2* - d; 

9: (Ffc^;y,) ='Double-Order([A-i];5-\y,_i,Ffc,_,,z- 1); 

Algorithm 5: High-Order(A, a, /3). 

Figure 4 shows which high-order terms are recovered during this giant steps 
of precomputation, for a computation using 50 precomputed terms of the Taylor 
development with Algorithm 5 and degree{A) = 6. 
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Figure 4: High-Order lifting to 2Li°g2(/3)J _ i computing r24_6, ^2^~6, Tae.g, 
^2^-61 ^2^-6 and r29_6. 

Lemma 16. The arithmetic complexity of Algorithm 5 is less than: 
31og2 (^^^) + max{4d2; - a + 2d)} 

Proof. Below are the complexity of the successive steps of Algorithm 5 

3. One Taylor expansion of an inverse of degree d up to the degree S < 
max{2d; /3 - a}, complexity J2t=i 2i - 1 + ELd+i 2d - 1 < d{2S - d) < 
max{3d^; d(/3 - a + d)}. 

5. One truncated polynomial multiplication of degree d — 1, complexity d^ . 

9. 6/3—60 < log2 (^^^^^ calls to Algorithm 4, complexity bounded by 3 logj (^^^^^ (P- 

□ 

Once the high-order terms are computed, one can get the development 
chunks of [^]a ^ shown in Algorithm 6. 



Input: A, B, (Ti) as defined in equation (13); 

Input: a valuation a, a degree /? and S — [A^^]q, with 6 > f3 ~ a. 

Output: [BA-^]^ 

1: it l3 <d then 

2: Return [BS]^; 

3: else 

4: Ba =^Residue(A, B, (r^), S", a); 

5: Return z°[B„S'](^~"; / / eq. (I4) 

Algorithm 6: DevelChunk(A, B, {r,),S, a, P). 

Algorithm 6 uses a variant of Algorithm 3, which needs to compute on the 
fly the chunks of the inverse it requires. This variant is shown in Algorithm 7. 

The point is that these chunks of the development of the inverse are recovered 
just like the chunks of any fraction, but with some high-order residues already 
computed. Algorithm 8 is thus a variant of Algorithm 6 with B = 1 and a 
special residue call. 
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In 
Oi 

1 

2 
3 
4 

5 
6 


put: A, B, (Ti), S, a as in algorithm 6. 
jtput: Ba defined by Lemma 13. 

if a = then 
Return B; 

else 

V =''lnverseChunk(A, {Ti),S, a - 2d + l,a - 1); 

ut'[vB]2z',; 

Return z'^iB - AU]^+'^-'^ . // Residue{A, B,a,V) 


Algorithm 7: Residue{A, B,{r,), S,a). 


In 
Oi 

1 

2 
3 
4 

5 


put: A, (Ti), a, (3, S — [A with 6 > P — a as in. algorithm 6. 
Jtput: [A-^]^ 

if /3 < (5 then 
Return [S]^; 

else 

r„ =''Getr(A, {Ti),S,a); // Residue{A,l,a); 
Return z^paS*]^""; / / eq. (U) 



Algorithm 8: InverseChunk(A, (Fi), S", a, /?). 



Algorithm 9 is this special residue call, a variant of Algorithm 7, which uses 
the precomputed high-order Fj to get the actual Fq, it needs, in a logarithmic 
binary-like decomposition of a. 



Input: A, (Ti), S, a as in algorithm 6. 


Output: Fq; 


1 


if a = then 


2 


Return 1. 


3 


else if a < S then 


4 




5 
6 


Return z-"[l - AC/]^+^~i. // Residue{A,l,a, S) 
else 


7 


a = Llog2(" + d)\ ; // so that 2" < a + d < 2""+^ 


8 


Return Residue(A, F^, (F,,), a + d - 2"); 


Algorithm 9: GetF(A, (F^), S", a). 



We have shown in Figure 4 the high-order precomputation of the different 
F's required for the computation e.g. of [BA'^Wl^^^ = [Sgso^'^l^*^, with A of 
degree 6. Then, Figure 5 shows the successive baby step calls of residue and 
inverse chunks needed to compute the 950th residue iJgso 

Lemma 17. The arithmetic complexity of computing the [a,/3] chunk of the 
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[A-']f,' [A'']l§ [A-^mi [A-']lll [A-^]fs 




1 


Residue 


Residue 


Residue 






r250 




ri22 





Figure 5: DevelChunk Expansion (alg. 6 from right to left) where some chunks 
(boxed) were precomputed by high-order lifting (alg. 5 and fig. 4). 

development of [BA^^{z)]^ via Algorithm 6 is less than: 



log2 



l3 + d 
2d 



Proof. Except for the calls to other algorithms, DevelChunk (alg. 6) has com- 
plexity (/3 — a)^, Getr (alg. 9) has complexity d^, InverseChunk (alg. 8) has 
complexity (/3 — a)^ and Residue (alg. 7) has complexity 2d^. 

Now the logarithmic binary decomposition of /? shows that GetF, Inver- 
seChunk and Residue are each called less than log2 (^^^^^ times. □ 

4.4 Fiduccia's algorithm for linear recurring sequences 

An alternative to the high-order lifting is to directly use Fiduccia's algorithm for 
linear recurring sequences [Fiduccia, 1985]. Its complexity is slightly different 
and we show next that it can be interesting when j3 = a. 

4.4.1 Single coefficient recovery 

With the same notations as before, one wants to solve B = Ax T ioT B and A 
polynomials of degree bounded by d and T a series development. We want to 
obtain the coefficients of T only between degrees a and (3. The algorithm is as 
follows: solve directly for the first d terms of T and afterwards, if A = X^iLo '^j^N 
we obtain a recurring linear sequence for the coefficients of T = tiZ^: 

d 

a-ote = - ^ aite^i (15) 



If Qq 7^ 0, let us define the characteristic polynomial P{Z) = rev{A)/ao = 
A{l/Z)Z'^/ao. This induces the following linear system involving the companion 
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matrix of P, C = Conipanion(P(Z)): 




1 



V 



ao 



By developing the above system £ - 
the simple dot product: 



\ 



- — J 

aa / 



■ d times, we obtain one coefficient of T with 





( tt-d \ 
















[ ) 







[0,...,0,1] 



,e-d 



[tl, ■ ■ ■ , td]"^ — [tl 



,tA X C 



■e-d 



X [0, 



,o,if 

(16) 

The idea of Fiduccia is then to use the Cayley-Hamilton theorem, P{C) = 
0, and identify polynomials with a vector representation of their coefficients, 
in order to obtain C^"'' x [0, ...,0, 1]^ = x Z'^-^ = Z^-^ mod P(Z). 

Now the modular exponentiation is obtained by binary recursive squaring in 
log2(^ — 1) steps, each one involving 1 or 2 multiplications and 1 or 2 divisions 
of degree d, depending on the bit pattern of £ — 1. Then, the complexity of 
the modular exponentiation is bounded by log2(£)(8(i^) with an average value 
of log2(£)(6d^), exactly the same constant factor as the high-order lifting when 
j3 = a. But then the additional operations are just a dot product of the obtained 
polynomial with [ti , . . . , tj\ and the initial direct Taylor recovery of the latter 
coefficients. Thus yielding the overall complexity for a single coefficient of the 
series of log2(^)(6c?^) + arithmetic operations. 



4.4.2 Cluster of coefficients recovery 

In the more generic case of several clustered coefficients, £ € [a, /3], one needs to 

modify the algorithm, in order to avoid computing {3— a products by [0, . . . , 0, 1, 0, . . . , 0]"^. 

Instead one will recover d coefficients at a time, in steps. 

First the binary recursive powering algorithm is used to get '^^^ expres- 
sions of C'"+(-'~i''' = YltZo '^t^ ^ ^-t an average global arithmetic cost of 

log2(d) -t- log2(a)) (6d2) . Then for w = [^i, . . . , t^]'^, the sequence Cv, C^v, . . . , C^-^v 
is computed once, iteratively. Finally this sequence is combined with the coeffi- 
cients cp^ to obtain the 13— a coefficients at an overall cost of (^^^^ log2(c?) + log2(Q;)^ (6d^)-|- 
4rf2 + max{rf2; d(2(/3 ~ a) ~ d}. 



4.4.3 High-Order and Fiduccia algorithm comparison 

We compare in table 2 the arithmetic complexity bound of Storjohann's high- 
order lifting and the average complexity bound for Fiduccia's algorithm. 

From this table we see that Storjohann's algorithm should be preferred when 
/3 ^ a and that Fiduccia's algorithm should be preferred when both conditions 
f3 — a and "rf is small" are satisfied. In practice, on the bivariate examples of 
section 5, with /3 — a, the differences remained within 20% and were always 
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Algorithm 


£^ I3 = a 




High-Order 
Fiduccia 




(6rf^ + 2(/?-a)^)log2(^) 
+d(/3 -a + 2d) 
{6d^) (fc^log,(d) + log2(a)) 
+d(2(/3-a) + 3d) 



Table 2: Complexities, for /3 > 2c?, of Storjohann's high-order lifting and Fiduc- 
cia's algorithm, the latter on average. 

dominated by the fraction reconstruction. Therefore, in the following, we use 
preferably Storjohann's high-order lifting. 

4.5 Bivariate lifting 

We come back to the bivariate case of Equation (7). B, A and all the gi are 
polynomials in y (not fractions). Therefore one can compute the lifting on z 
using arithmetic modulo j/""*"^ for the coefficients. Operations in this latter 
domain thus costs no more than O(n^) operations over Q. In the following we 
use the formalism of the high-order lifting, but the algorithm of Fiduccia can 
be used as a replacement if needed. 

Finally, for faster computations, one can also convert the rational coefficients 
to their floating point approximations in order to get Algorithm 10. 



Input: ^^eQ[y]{z). 

Output: A floating point 
1: Bf{y, z) and Af{y, z) are the conversion of B and A in floating points. 
2: (F,) =High-Order(A/,a,;3) modulo 2/"+^ 

=DevelChunk(yl/, B/, {Ti),a,(5) modulo ?/"+^; 
4: Return {[gj{y)]l)j=a,...,p- 

Algorithm 10: Bivariate floating point lifting. 

Then floating point arithmetic modulo y"^^, together with Lemmata 17 
and 16, yield the following complexity for the computation of chunks of the 
Taylor development of a bivariate polynomial fraction: 

Proposition 18. Let G{y, z) — be a rational fraction with B and A 

polynomials of degrees less than d with floating point coefficients. Suppose now 
that /3 >> d, and that /3 — a = 0{d). Then the overall complexity to compute 
with Algorithm 10 and classical polynomial arithmetic is 

0{\og{l3)d^n^) 

rational operations. 

This improves e.g. on the algorithm of Knuth [1997] used in Nicodcmc et al. 
[2002, Algorithm 8], which has complexity 0(log(/3)d^n^). 
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Note that, with fast floating point polynomial arithmetic (for instance using 
FFT), our complexity would reduce to 

0(log(/3)dlog(d)nlog(n)) = 0((nd)i+°(i)log(/3)). 



4.6 Overall complexity 

Another improvement can be made on Algorithm 2 when the desired degree n 
in y of the development is smaller than the degree d of the obtained bivariate 
fraction: compute the series and the fraction reconstruction also modulo y""*"^. 
Recall that we consider computing V{Ni = n) where the transition matrix is 
of dimension R, with 0{R) nonzero coefficients, and the rational fraction is of 
degree d < R. Therefore, the overall complexity of Algorithms 2-10, with fast 
arithmetic, computing the latter, is bounded by: 

O (mm{n, d}d'^R\og{R) + log{£)d^+°'^^^n^+°'^^^^ (17) 



5 Applications 

5.1 Toy-examples 

We consider here an independent and identically distributed sequence of letters 
that are uniformly distributed over the four letters alphabet A — {A,B, C,D}. 
Partial recursion is performed with a floating point-arithmetic precision of e = 
1/2^"^^ ~ 10^^^" (implementation using the mpf class from the GMP**), and 
relative error 77 = 10~^^. The bivariate polynomial fraction reconstruction has 
been implemented using LinBox'' and GiVARO^' and the high-order lifting using 
GiVARO and Mpfr'' with the mpreal'*^ C-I-+ wrapper. All running times are 
expressed in seconds and MT stands for Memory Thrashing. 

In Table 3 we consider 5 example motifs of increasing complexities. For the 
partial recursion approach, the eigenvalue A is reported along with the corre- 
sponding computational time. One should note that this part of the algorithm 
uses in fact a very inefficient approach (the power method) while more effective 
approaches are available (e.g. Lanczos iterations). But the performance remain 
acceptable overall. We also report in this table the fractional degrees of G{y, z) 
computed through the rational reconstruction. We can see that the limiting fac- 
tor of the series computation is memory. For example, for Motif AD(A|D){15}AD, 
only storing the first 2d = 3570-1-3572 = 7142 bivariate terms over the rationals 

*GNU Multi-Precision library http://gmplib.org/ 
^http : //linalg . org 

•^http : //I jk . imag . f r/CASYS/LOGICIELS/givaro 
^http : //www .mpf r . org 

*http : //www .holoborodko . com/pavel/mpf r 
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POSIX regex 


R 


F 


1 - A 




h 


frac. deg. 


t2 


AD(A|D){0}AD 


5 


1 


3.7 X 10- 


-3 


0.03 


2/4 


0.00 


AD(A D){2}AD 


12 


2 


9.5 X 10- 


-4 


0.11 


6/8 


0.01 


AD(A D){5}AD 


50 


8 


1.2 X 10- 


-4 


0.49 


28/30 


0.12 


AD(A|D){10}AD 


555 


89 


3.7 X 10- 


-5 


6.14 


321/323 


3.18 


AD(AD){15}AD 


6155 


987 


1.2 X 10- 


-7 


73.46 


3570/3572 


17035.18 



Table 3: Toy-example motifs over the alphabet A = {A,B, C,D}. R (resp. F) is 
the number of states (resp. final states) of the minimal order DFA associated 
to the regular expression. A is the largest eigenvalue of P, and ti the time to 
compute A using the power method, "frac. deg." corresponds to the fractional 
degrees of G{y, z) and t2 is the time to compute G(y, z) using Algorithm 2. 

of the series would require the order of Ad? R\og2s{R) ~ 1.7 x 10^ Gigabytes, 
using the estimates of Proposition 12. Note that for this motif, the degrees in 
z of the numerator and denominator of the fraction are only probabilistic since 
they were computed modulo a random word-size prime number at a random 
evaluation in y. 

In Table 4 we perform the computation of V{Nt = n) for the considered 
motifs for various ranges of values for i and n. For validation purposes, the 
results of the partial recursion are compared to those of the slower full recur- 
sion. The relative error between the two approaches is compared to expected 
relative error rj: in all cases but one the resulting error ratio (e.r.) is below 
1.0 thus proving that both results are quite consistent. In the remaining case, 
e.r. is only slightly larger than 1.0 (1.495) which remains acceptable. In terms 
of computational time however, the partial recursion approach is dramatically 
faster that the full recursion. This is especially sensitive for the more complex 
motifs for which full recursion was not even performed in some cases. 

With the high-order lifting approach (Algorithms 2-10) we see that whenever 
the degree of the bivariate fraction remains small, the overall performance is 
very good. Moreover, one could compute the fraction once and then use the 
very fast high-order lifting to recover any coefficient at a negligible cost. Now 
when the degrees and the size of the involved matrices grows, memory becomes 
the limiting factor, just to store the series, prior to any computation on it. 

In terms of empirical complexity, the full recursion increases at the expected 
0{nx£) rate. On the other hand, the partial recursion running time is consistent 
with an 0{a x n) rate with a increasing at a roughly linear rate with n. 

5.2 Transcription factors in Human Chromosome 5 

In this section, we consider the complete DNA {A = {A, C, G, T}) sequence'' of 
the Human Chromosome 5. In order to take into account the codon (DNA 
words of size 3) structure of the sequence (which is known to play a key role 

'■'http : / /www .pseudogenes . org/data/human/build36/geiiome/chr5 .fa 
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n a 




? P(iV^ = n) 




e.r. 


to 




+ti 


t4 


+t2 


Q 


10 90 


2 


000 9.12559 X 10" 


"2 


0.234 


0.50 


0.04 


0.07 


0.01 


0.01 


■< 
O 




20 


000 4.37982 x 10" 




0.168 


5.00 


0.04 


0.07 


0.01 


0.01 






200, 


000 3.82435 x 10" 




0.063 


49.92 


0.04 


0.07 


0.01 


0.01 


o 


100 666 


2 


000 9.06698 x 10" 


-59 


0.025 


4.47 


2.53 


2.56 


0.01 


0.01 


o 




20 


000 2.95125 X 10" 


.3 


0.586 


46.04 


2.53 


2.56 


0.01 


0.01 






200 


000 1.07460 X 10" 


-196 


1.495 


461.61 


2.53 


2.56 


0.01 


0.01 




10 128 


2, 


000 6.06131 X 10" 




0.025 


1.12 


0.13 


0.24 


0.01 


0.02 




20 


000 8.13580 X 10" 




0.114 


11.38 


0.13 


0.24 


0.01 


0.02 






200 


000 2.54950 x 10" 




0.158 


113.13 


0.13 


0.24 


0.01 


0.02 


o 

•a! 


100 971 


2 


000 4.58582 x 10" 


94 


0.027 


10.44 


8.97 


9.08 


0.01 


0.02 


« 




20 


000 1.14066 X 10" 


-34 


0.260 


107.05 


8.97 


9.08 


0.01 


0.03 






200, 


000 5.92396 x 10" 


-14 


0.232 


1075.90 


8.97 


9.08 


0.01 


0.03 




2 158 


2 


000 2.59931 X 10" 




0.031 


1.23 


0.07 


0.56 


0.00 


0.13 


in 




20 


000 2.55206 x 10" 




0.040 


12.80 


0.07 


0.56 


0.01 


0.13 


3 




200 


000 1.35276 X 10" 




0.041 


124.76 


0.07 


0.56 


0.01 


0.13 


■< 


20 278 


2, 


000 1.59351 X 10" 


-22 


0.055 


8.76 


2.18 


2.67 


0.02 


0.64 






20 


000 3.79239 x 10" 


-11 


0.126 


88.19 


2.18 


2.67 


0.03 


0.65 




200 


000 5.79753 x 10" 


2 


0.044 


912.11 


2.18 


2.67 


0.04 


0.66 


o 

■a! 


2 75 


2 


000 2.38948 x 10" 


4 


0.017 


14.38 


1.05 


7.19 


0.13 


27.90 


O 




20, 


000 4.4012 X 10"- 




0.093 


148.49 


1.05 


7.19 


0.32 


28.07 


■rH 




200, 


000 1.33166 X 10" 


-1 


NA 


NA 


1.05 


7.19 


0.48 


28.12 


a 


20 380 


2 


000 1.24717 X 10" 


-27 


0.000 


100.45 


34.41 


40.55 


0.80 


261.21 


<e 




20 


000 1.25298 X 10" 


-25 


NA 


NA 


34.41 


40.55 


1.84 


263.35 


a 




200 


000 6.25326 x 10" 


-18 


NA 


NA 


34.41 


40.55 


2.72 


264.05 


9 


2 87 


2 


000 6.74582 x 10" 


-6 


0.001 


153.54 


12.95 


86.41 


0.16 


17035.34 




20 


000 7.02066 X 10" 


5 


NA 


NA 


12.95 


86.41 






■rH 




200 


000 9.09232 x 10" 


4 


NA 


NA 


12.95 


86.41 






Q 


20 491 


2 


000 5.72720 x 10" 


30 


NA 


NA 


477.05 


550.51 






•< 




20, 


000 6.39056 x 10" 


-29 


NA 


NA 


477.05 


550.51 






9 




200, 


000 1.42666 X 10" 


-27 


NA 


NA 


477.05 550.51 







Table 4: F{N£ = n) for the toy-example motifs over the alphabet A = {A, B, C, D} 
using a i.i.d. and uniformly distributed background model, a is the rank of the 
partial recursion (depends only on n), "e.r." is the ratio of the relative error 
of the computation divided by the targeted relative error r] = 10"-^^, to is the 
running time to perform the computation using the full recursion, f;; is the 
running time to perform the computation using the partial recursion 
gives the total running time ti + ts), t4 is the running time to perform the 
computation using the high-order lifting ("-|-f2" give the total running time 

t2+t4). 
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in coding sequences), we adjust a homogeneous order 2 Markov model on the 
data^°. Sequence length is £ — 131,624,728, and the sequence starts with 
the two letters GA. The maximum likelihood estimate (MLE) of the transition 
matrix of the model is directly obtained from the observed counts of all DNA 
words of size 3. For example, since 7V(TAA) = 2632852, 7V(TAC) = 1451956, 
iV(TAG) = 1655432 and iV(TAT) = 2565811, we get the MLE: 



P(X, . C|X._,X._, . TA) ''''''' 



2632852 + 1451956 + 1655432 + 2565811 
One should note that our Markov parameters are then all rationals. 



Transcription Factor 


R 


F 


1 - A 




tl 


frac. dcg. 


t2 


CGCACCC 


21 


1 


6.10426 X 10- 


5 


0.13 


18/19 


3.24 


TCCGTGGA 


22 


1 


1.52604 X 10" 


5 


0.13 


19/20 


3.62 


ACAACAAC 


23 


1 


1.50220 X 10" 


5 


0.15 


21/22 


5.71 


(A|C)TAAA(C|T)AA 


25 


2 


6.08161 X 10" 


5 


0.18 


20/20 


4.36 


{A|T){3}TTTGCTC(A|G) 


30 


2 


3.81483 X 10" 


6 


0.20 


23/23 


5.50 


A{24} 


38 


1 


2.66454 X 10" 


15 


0.25 


36/37 


3.78 


TA(A|T){4}TAG(A|C) 


54 


2 


3.05260 X 10" 


5 


0.45 


21/22 


1.41 


(C|T)CCN(C|T)TN(A|G){2}CCGN 


66 


4 


1.52614 X 10" 


5 


0.63 


24/25 


9.57 


GCGCN{6}GCGC 


228 


8 


1.51359 X 10- 


5 


3.84 


54/55 


66.52 


CGGN{8}CGG 


419 


13 


2.40786 X 10- 


4 


10.12 


81/82 


283.61 


TTGACAN{17}TATAAT 


2068 


34 


5.96047 X 10- 


8 


34.91 


173/173 


6392.23 


TTGACAN{16, 18}ATATAAT 


2904 


55 


4.47035 X 10- 


8 


49.18 


253/253 


23727.28 


GCGCN{15}GCGC 


6158 


225 


1.51359 X 10- 


5 


202.48 


1079/1080 


MT 



Table 5: Regular expression of Transcription Factors (TFs) defined over the 
DNA alphabet A = {A,C,G,T} using the lUPAC notation N = (A|C|G|T). R 
(resp. F) is the number of states (resp. final states) of the minimal order 2 
DFA associated to the TFs. A is the largest eigenvalue of P, and ti the time to 
compute A using the power method, "frac. deg." corresponds to the fractional 
degrees of G{y, z), and t2 is the time to compute G{y, z) using Algorithm 2. 

In Table 5 we consider a selection of various Transcription Factors (TFs) 
motifs. These TFs are typically involved in the regulation of gene expression. 
The selected motifs range from simple patterns (e.g. CGCACCC) to highly complex 
ones (e.g. GCGCN{15}GCGC). For each motif, the precomputations necessary for 
the partial recursion (computation of A) and the high-order lifting approach 
(computation of G{y,z)) are indicated. As in Table 3 we see that the running 
time increases with the motif complexity, eventually resulting in a memory 
thrashing (MT) for the computation of the rational reconstruction. One should 
note that time t2 is larger for these motifs than for the toy examples of the 
previous section even when the fractional degrees are similar. This is explained 

^''Homogeneous order m ^ Markov model MLE use the observed frequencies of (m + 1)- 
words. Taking into account the codons (3-words) frequency hence requires to consider at least 
an order m = 2 Markov model. 
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by the more complex nature of the model parameters (e.g. 1451956/8306051 
for the TFs vs 1/4 for the toy-example). 

In Table 6, we can see the computed values of P(A^^ ~ n) for our TFs motifs 
and for various values of n. Due to the large value of i, the full recursion was 
no longer tractable and there is hence no reference value for the probability of 
interest. However, the results of both approaches are always the same (up to 
the requested relative precision). For low complexity TFs, the high-lifting is 
always much faster than the partial recursion when considering only the core 
computations. However, we get the opposite results when we consider as well 
the pre-computation time (i.e.: obtaining G{y,z) for the high-order lifting, or 
computing A for the partial recursion). As for the toy-examples, we see that the 
high-order lifting approach cannot cope with high complexity patterns since the 
fractional reconstruction is not feasible for them. 



5.3 Protein signatures 

We consider here the complete human proteome build as the concatenation of 
all human protein sequences over the 20 aminoacid alphabet (from the Uniprot 
database^^) resulting in a unique sequence of length £ — 9, 884, 385. We fit an or- 
der Markov model onto these data from the observed counts of all aminoacids: 

N{A) = 691113, N{R) = 555875, N{N) = 357955, N{D) = 472303, N{C) = 227722, 

N{E) = 698841, N{Q) = 469260, N{G) = 649800, N{E) = 258779, N{l) = 432849, 

Af(L) = 981769, N{K) = 567289, N{K) = 212203, N{F) = 363883, Af(P) = 617242, 

N{S) = 816977, N{T) = 529157, Af(W) = 119979, Af{Y) = 267663, Af(V) = 593726. 

As a consequence, our MLE parameters are expressed as rationals. For example: 
F{X, = W) = 119979/9884385. 

We also consider a selection of 10 PROSITE^" signatures which correspond 
to known functional motifs in proteins. In Table 7, the complexity of the con- 
sidered motifs are studied along with the computational time to obtain A (time 
^i) or to obtain G{y,z) (time ^2)- Motifs are sorted by increasing complexi- 
ties, from Signature PILI_CHAPERONE (whose minimal DFA has i? = 46 states 
including F = 1 final state) to Signature SUGAR_TRANSP0RT_2 (whose minimal 
DFA has R — 1152 states including F = 40 final states). For both approaches, 
the running time for these precomputations is similar but, as for previous ap- 
plications, we observe a steeper increase for the fractional reconstruction when 
considering high complexity motifs. 

In Table 8 we compute V{Nt = n) for all considered PROSITE signatures 
and a range of values for n. For each combination, both the partial recursion 
and the high-order lifting are performed and the two methods agree perfectly 
in their results. As for the TFs, the fast Taylor expansion (time tn) is much 
faster that the recursion part (time t^) but the precomputation of G(j/, z) (time 
^4) has a high cost, especially for the signatures of high complexity which is 
consistent with previous observations. 

^ http : / /www . uniprot . org 
^^http : / /expasy . org/prosite/ 
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Transcription Factor 


n 


a 


f{Ni = n 


) 


t3 


+ti 


t4 


+t2 


CGCACCC 


10 


117 


3.64365 X 10- 


-dVI 


0.19 


0.32 


0.41 


3.65 




20 


204 


1.27159 X 10- 


-551 


0.60 


0.73 


1.05 


4.32 




40 


373 


2.07574 X 10- 


-518 


2.10 


2.23 


3.17 


6.44 


TCCGTGGA 


10 


131 


1.33747 X 10- 


-268 


0.20 


0.33 


0.01 


3.63 




20 


225 


3.46367 X 10- 


-252 


0.63 


0.76 


0.03 


3.65 




40 


409 


3.11336 X 10- 


-225 


2.17 


2.30 


0.05 


3.70 


AACAACAAC 


10 


142 


3.86490 X 10- 


-170 


0.25 


0.40 


0.02 


5.73 




20 


258 


1.22856 X 10" 


-155 


0.88 


1.03 


0.03 


5.76 




40 


492 


1.69964 X 10" 


-132 


3.24 


3.39 


0.06 


5.79 


(A|C)TAAA(C|T)AA 


10 


136 


6.76399 X 10" 


-8UtiV 


0.26 


0.44 


0.53 


4.89 




20 


240 


4.79070 X 10" 


-8036 


0.87 


1.05 


1.35 


5.71 




40 


449 


3.22178 X 10- 


-7980 


3.14 


3.32 


4.07 


8.44 


(A|T){3}TTTGCTC(A|G) 


10 


150 


6.03263 X 10" 


-579 


0.30 


0.50 


0.63 


6.13 




20 


267 


2.40165 X 10" 


-559 


0.99 


1.19 


1.59 


7.12 




40 


500 


5.10153 X 10- 


-526 


3.58 


3.78 


4.87 


10.42 


A{24} 


5 


171 


1.16314 X 10- 


-4 


0.31 


0.56 


0.02 


8.73 




10 


310 


1.09217 X 10- 


-6 


1.01 


1.26 


0.02 


17.75 




20 


589 


9.62071 X 10- 


-11 


3.68 


3.93 


0.04 


40.27 


TA(A|T){4}TAG(A|C) 


5 


93 


1.60427 X 10- 


-3914 


0.21 


0.66 


0.12 


3.20 




10 


148 


3.23597 X 10- 


-3899 


0.54 


0.99 


0.27 


5.49 




20 


256 


1.79579 X 10- 


-3871 


1.69 


2.14 


0.76 


7.40 


(C|T)CCN(C|T)TN(A|G){2}CCGN 


5 


150 


1.94195 X 10- 


-173 


0.43 


1.06 


0.02 


9.59 




10 


215 


8.71218 X 10" 


-165 


1.00 


1.63 


0.02 


8.60 




20 


342 


2.39167 X 10- 


-150 


3.01 


3.64 


0.05 


8.63 


GCGCN{6}GCGC 


1 


65 


4.73516 X 10" 


-19 


0.24 


4.08 


0.62 


67.09 




2 


92 


1.08880 X 10" 


-17 


0.50 


4.34 


0.87 


98.62 




4 


138 


1.91912 X 10" 


-15 


1.22 


5.06 


1.49 


155.99 


CGGN{8}CGG 


1 


82 


5.21188 X 10- 


-46V 


0.59 


10.71 


1.47 


284.93 




2 


114 


2.80818 X 10- 


-464 


1.17 


11.29 


1.79 


403.75 




4 


169 


2.71751 X 10- 


-459 


2.74 


12.86 


3.12 


651.20 


TTGACAN{17}TATAAT 


1 


92 


6.97988 X 10- 


-7 


3.06 


37.97 


2.49 


6394.73 




2 


137 


5.93598 X 10- 


-6 


6.72 


41.63 


3.78 


9378.03 




4 


199 


1.43106 X 10- 


-4 


15.23 


50.14 


6.63 


15008.84 


TTGACAN{16, 18}ATATAAT 


1 


96 


2.28201 X 10- 


-6 


4.86 


54.04 


5.30 


23732.58 




2 


129 


1.79676 X 10- 


-5 


9.38 


58.56 


7.42 


34949.45 




4 


202 


3.71288 X 10- 


-4 


23.16 


72.34 


15.65 


56832.04 


GCGCN{15}GCGC 


1 


119 


4.71467 X 10- 


-19 


12.62 


215.10 




MT 




2 


173 


1.08420 X 10 


-17 


27.15 


229.63 




MT 




4 


255 


1.91136 X 10- 


-15 


63.45 


265.93 




MT 



Table 6: Y{Ne, = n), with i = 131 624 728, for the TFs motifs over the alphabet 
A = {A, C,G,T} using an order 2 homogeneous Markov model, a is the rank 
of the partial recursion (depends only on n) , is the running time to perform 
the computation using the partial recursion gives the total running time 

ti+ts), is the running time to perform the computation using the high-order 
lifting ("+^2" give the total running time t2 + ti). 
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PROSITE signature 


AC 


R 


F 


1 - 


- A 




tl 


degrees 


t2 


PILI_CHAPERONE 


PS00635 


46 


1 


1.7 X 


10- 


iO 


0.63 


15/18 


0.27 


EFACTOR.GTP 


PS00301 


52 


4 


1.0 X 


10" 


8 


0.74 


14/16 


0.18 


ALDEHYDE_DEHYDR_CYS 


PS00070 


67 


17 


1.1 X 


10- 


6 


0.91 


11/12 


0.21 


SIGMA54_INTERACT_2 


PS00676 


85 


1 


8.8 X 


10" 


10 


1.08 


0/16 


0.22 


ADH_ZINC 


PS00059 


87 


8 


2.2 X 


10" 


7 


1.40 


37/41 


2.07 


SUGAR_TRANSP0RT_1 


PS00216 


188 


54 


6.7 X 


10' 


7 


3.48 


17/18 


1.05 


THI0LASE_1 


PS00098 


254 


6 


2.6 X 


10' 


15 


5.21 


37/38 


1.76 


FGGY_KINASES_2 


PS00445 


463 


6 


2.2 X 


10' 


7 


11.52 


26/30 


2.39 


PTS_EIIA_TYPE_2_HIS 


PS00372 


756 


46 


1.3 X 


10" 


9 


17.47 


77/80 


18.60 


SUGAR_TRANSP0RT_2 


PS00217 


1152 


40 


8.6 X 


10" 


7 


36.95 


149/151 


68.36 



Table 7: Characteristics of some PROSITE signatures defined over the 
aminoacid alphabet. AC is the accession number in the PROSITE database, R 
(resp. F) is the number of states (resp. final states) of the minimal order 2 DFA 
associated to the signatures. A is the largest eigenvalue of P, and ti the time to 
compute A using the power method, "frac. deg." corresponds to the fractional 
degrees of G{y, z), and t2 is the time to compute G{y^ z) using Algorithm 2. 

6 Conclusion 

We have developed two efficient approaches to obtain the exact distribution of a 
pattern in a long sequence. Table 9 recalls the different obtained complexities. 

The first approach uses a partial recursion and is suitable even for high 
complexity patterns. Unfortunately, its quadratic complexity with the observed 
number of occurrence n makes it not recommended for large values of n. 

The second approach has two steps: first obtaining G{y, z) through fraction 
reconstruction of the series ^ Gg{y)z^ and then performing fast Taylor expan- 
sion using high-order lifting. On the one hand, in all examples, just computing 
the series appears to be the bottleneck, especially for high complexity patterns. 
On the other hand, the fast Taylor expansion is usually very fast, even if the 
running time increases with the fractional degrees. Moreover, once the gener- 
ating function has been obtained, the fast liftings can reveal the distribution of 
any pattern count at a very low cost. 

Further work include improvement of the precomputation of A, using e.g. 
Lanczos-like approaches; the precomputation of G(y, z) could also be improved, 
for instance reconstructing the rational fraction from approximated evaluations 
of the series. However, an exact reconstruction on approximate values could 
yield a reasonable model, but only with a generic behavior. That is d, the 
obtained degree, would then in general be equal to i?, the size of the input 
matrices. On the contrary, in the examples, this degree is much lower in practice. 
One should also note that the distribution of motif occurrence is known to be 
very sensitive to the parameters (transition of the Markov chain) and that any 
approximation performed on these parameter might have large and uncontrolled 
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FKOblili signature 


n 


a 


P(A''^ = n) 






+ti 


t4 


+t2 


PILI_CHAPERONE 


5 


125 


1.20635 X 10" 


16 


0.98 


1.16 


0.01 


0.17 








0. ^o4yi X iU 


35 


3.1 / 


Q on 
3.80 


O.Ul 


n oo 
U.2o 




20 


413 


1.81452 X 10 


74 


11.20 


11.83 


0.01 


0.69 


EFACTOR-GTP 


5 


114 


7.25588 X 10" 


8 


1.07 


0.84 


0.01 


0.18 








z.oU/UO X iU 


17 


0.41 


U.9i3 


U.Ul 


U.o2 




20 


364 


3.18090 X 10 


39 


12.23 


1.28 


0.01 


0.75 


ALDEHYDE_DEHYDR_CYS 


5 


88 


1.85592 X 10 


2 


1.13 


2.04 


0.01 


0.20 




1 n 
iU 


ioi 


i.iOiol X lU 


1 


o.oa 


4.iDU 


U.Ui 


U..)U 




20 


270 


6.05053 X 10 


3 


12.27 


13.18 


0.01 


0.60 


SIGMA54_INTERACT_2 


5 


106 


4.09432 X 10" 


13 


1.45 


2.53 


0.01 


0.23 








D. < uy ( i X iU 


28 


4.00 


o.lo 


U.Ui 


n on 

u.2y 




20 


350 


2.45724 X 10 


60 


16.17 


17.25 


0.01 


0.40 


ADH_ZINC 


5 


116 


4.51132 X 10 


•2 


1.91 


3.31 


0.01 


2.08 




1 n 

J-U 


1 Qfi 




5 


O.oO 


7 ^8 
1 .oo 




O.OKJ 




20 


352 


2.29397 X 10" 


13 


20.52 


21.92 


0.04 


7.57 


SUGAR_TRANSP0RT_1 


1 


60 


6.06925 X 10" 


3 


0.65 


4.13 


0.05 


1.10 




2 


75 


1.64759 X 10" 


2 


1.29 


4.77 


0.08 


1.51 




4 


110 


5.41084 X 10" 


2 


3.30 


6.78 


0.12 


2.29 


THI0LASE_1 


1 


85 


2.54343 X 10" 


8 


1.23 


6.44 


0.01 


1.77 




2 


127 


1.61364 X 10" 


15 


2.85 


8.06 


0.01 


2.04 




4 


207 


6.25151 X 10" 


30 


7.96 


13.17 


0.01 


2.38 


FGGY .KINASES _2 


1 


73 


2.43018 X 10" 


1 


2.08 


13.60 


0.01 


2.40 




2 


97 


2.68005 X 10" 


1 


4.21 


15.73 


0.01 


3.32 




4 


142 


1.08650 X 10" 


1 


10.27 


19.48 


0.01 


4.54 


PTS_EIIA_TYPE_2JIIS 


1 


76 


1.23843 X 10" 


2 


3.41 


20.88 


0.41 


19.02 




2 


105 


7.76535 X 10" 


5 


7.32 


24.79 


0.61 


27.96 




4 


163 


1.0177 X 10"^ 




19.18 


36.65 


1.08 


44.72 


SUGAR_TRANSP0RT_2 


1 


96 


1.71124 X 10" 


3 


6.78 


43.73 


1.37 


69.73 




2 


124 


7.28305 X 10" 


3 


13.51 


50.46 


2.01 


103.56 




4 


177 


4.39742 X 10" 


2 


32.81 


69.76 


3.57 


172.62 



Table 8: F{Ne = n), with £ = 9 884 385, for the PROSITE signatures over the 
aminoacid alphabet an order homogeneous Markov model, a is the rank of 
the partial recursion (depends only on n), fa is the running time to perform 
the computation using the partial recursion gives the total running time 

h+ta), ti is the running time to perform the computation using the high-order 
lifting ("+t2" give the total running time t2 + ti). 



30 





Approach 


Memory 


Time 


s 


Full recursion 


0{nR) 


0{n\A\Ri) 


X 



Ribcca and Raincri [2008] 


O {nR^ \og{£)) 


O {n'^R^ log(£)) 


p. 
< 


Partial recursion 


0(nR) 


0(n*|^|iJlog{^)/log{y-i)) 


+^ 

u 
tS 


Nicodeme et al. [2002] 


0{r?R'^ \og{R)) 


O log(_R) + 'n? log(^)) 


X 

W 


High-order 


0(nd2R log(R)) 


O {nd?R\og{R) + (nd)i+°(i' log(£)) 



Table 9: Complexities of the different approaches to compute P(iV£ = n). R is 
the size of the automaton, \A\ is the size of the alphabet, d < i? is the degree 
of the rational fraction and < < 1 is the magnitude of the second largest 
eigenvalue of P. 



consequences [Nucl, 2006c]. 

Another solution could be to use regularization methods or approximate 
gcd-likc approaches of e.g. Kaltofcn and Yang [2007] for the pre-computation 
of G(?/, z). This could yield significant improvements both in terms of memory 
and computational time if the small degrees were preserved. 

Overall, the high-order lifting approach is very efficient for low or median 
complexity motifs, but cannot deal efficiently with the highly complex motifs. In 
our examples, we dealt with two real applications (TFs in Human Chromosome 
5, and PROSITE signatures in the Human complete proteome) which demon- 
strate the practical interest of our approaches. Finally, dealing with fully exact 
computations for frequent (large n) and high complexity (large R) motifs yet 
remains an open problem. At the present time, for such challenging problems, it 
is likely that one can only rely on approximations like the Edgeworth expansions 
or the Bahadur-Rao approximations [see Nuel, 2011, for more details]. 

References 

A.V. Aho and JVI.J. Corasick. Efficient string matching: an aid to bibliographic 
search. Communications of the ACM, 18(6):333-340, 1975. 

C. Allauzen and M. Mohri. A unified construction of the glushkov, follow, and 
antimirov automata. Mathematical Foundations of Computer Science 2006, 
pages 110-121, 2006. 

D. L. Antzoulakos. Waiting times for patterns in a sequence of multistate trials. 
J. Appl. Prob., 38:508-518, 2001. 

E. Beaudoing, S. Freier, J.R. Wyatt, J.-M. Claverie, and D. Gautheret. Patterns 
of variant polyadenylation signal usage in human genes. Cenome Res., 10(7): 
1001-1010, 2000. 

V. Boeva, J. Clement, M. Regnier, and M. Vandenbogaert. Assessing the signif- 
icance of sets of words. In Combinatorial Pattern Matching 05, Lecture Notes 
in Computer Science, vol. 3537, pages 358-370. Springer- Verlag, 2005. 



31 



V. Bocva, J. Clement, M. Rcgnier, M.A. Roytberg, and V.J. Makecv. Exact 
p- value calculation for heterotypic clusters of regulatory motifs and its appli- 
cation in computational annotation of cis-regulatory modules. Algorithms for 
Molecular Biology, 2:13, 2007. 

A. Brazma, I. Jonassen, J. Vilo, and E Ukkonen. Predicting gene regulatory 
elements in silico on a genomic scale. Genome Res., 8(11):1202-1215, 1998. 

Y.-M. Chang. Distribution of waiting time until the rth occurrence of a com- 
pound pattern. Statistics and Probability Letters, 75(l):29-38, 2005. 

T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms, 
chapter 34, pages 853-885. MIT Press, 1990. 

Cowan. Expected frequencies of DNA patterns using Whittle's formula. J. Appl. 
Prob., 28:886-892, 1991. 

M. Crochemore and C. Hancart. Handbook of Formal Languages, Volume 2, 
Linear Modeling: Background and Application, chapter Automata for Match- 
ing Patterns, pages 399-462. Springer- Verlag, Berlin, 1997. 

M. Crochemore and V. Stefanov. Waiting time and complexity for matching 
patterns with automata. Info. Proc. Letters, 87(3):119-125, 2003. 

A. Denise, M. Regnier, and M. Vandenbogaert. Assessing the statistical signifi- 
cance of overrepresented oligonucleotides. Lecture Notes in Computer Science, 
2149:85-97, 2001. 

M. El Karoui, V. Biaudet, S. Schbath, and A. Gruss. Characteristics of chi 
distribution on different bacterial genomes. Res. Microbiol., 150:579-587, 
1999. 

T. Erhardsson. Compound Poisson approximation for counts of rare patterns 
in Markov chains and extreme sojourns in birth-death chains. Ann. Appl. 

Probab., 10(2):573-591, 2000. 

Charles M. Fiduccia. An efficient formula for linear recurrences. SIAM Journal 
on Computing, 14(1):106-112, February 1985. 

M. C. Frith, J. L. Spouge, U. Hansen, and Z. Weng. Statistical significance 
of clusters of motifs represented by position specific scoring matrices in nu- 
cleotide sequences. Nucl. Acids. Res., 30(14):3214-3224, 2002. 

J. C. Fu. Distribution theory of runs and patterns associated with a sequence 
of multi-state trials. Statistics Sinica, 6(4):957-974, 1996. 

M. X. Geske, A. P. Godbole, A. A. Schaffner, A. M Skrolnick, and G. L. Wall- 
strom. Compound Poisson approximations for word patterns under markovian 
hypotheses. J. Appl. Probab., 32:877-892, 1995. 



32 



A. P. Godbole. Poissons approximations for runs and patterns of rare events. 
Adv. Appl. Proh., 23, 1991. 

S. Hampson, D. Kibler, and P. Baldi. Distribution patterns of over-represented 
k-mers in non-coding yeast DNA. Bioinformatics, 18(4):513-528, 2002. 

J. Hopcroft. An n log n algorithm for minimizing states in a finite automaton. 
Reproduction, pages 189-196, 1971. 

J. E. Hopcroft, R. Motwani, and J. D. Ullman. Introduction to the automata 
theory, languages, and computation, 2d ed. ACM Press, New York, 2001. 

Erich Kaltofen and Zhengfeng Yang. On exact and approximate interpolation of 
sparse rational functions. In Christopher W. Brown, editor. Proceedings of the 
2007 ACM International Symposium on Symbolic and Algebraic Computation, 
Waterloo, Canada, pages 203-210. ACM Press, New York, July 29 - August 
1 2007. 

S. Karlin, C. Burge, and A.M. Campbell. Statistical analyses of counts and 
distributions of restriction sites in DNA sequences. Nucl. Acids. Res., 20(6): 
1363-1370, 1992. 

J. Kleffe and M. Borodovski. First and second moments of counts of words 
in random texts generated by Markov chains. Bioinformatics, 8(5):433-441, 
1997. 

Donald E. Knuth. Seminumerical Algorithms, volume 2 of The Art of Computer 
Programming. Addison- Wesley, Reading, MA, USA, 2"'* edition, 1997. ISBN 
0-201-89684-2. 

V. Le Maout. Regular expressions at their best: a case for rational design. 
Implementation and Application of Automata, pages 310-320, 2011. 

Gavin C. Kanga Leonardo Marifio- Ramirez, John L. Spouge and David Lands- 
man. Statistical analysis of over-represented words in human promoter se- 
quences. Nuc. Acids Res., 32(3):949-958, 2004. 

M. E. Lladser. Mininal Markov chain embeddings of pattern problems. In 
Information Theory and Applications Workshop, pages 251-255. IEEE, 2007. 
http : //ita . ucsd . edu/workshop/ 07/ f iles/paper/paper_505 . pdf . 

M. Lothaire, editor. Applied Combinatorics on Words. Cambridge University 
Press, Cambridge, 2005. 

Pierre Nicodeme, Bruno Salvy, and Philippe Flajolet. Motif statistics. Theoret- 
ical Computer Science, 287(2):593-617, September 2002. ISSN 0304-3975. 

G. Nuel. Ld-spatt: Large deviations statistics for patterns on Markov chains. 
J. Comp. Biol., 11 (6): 1023-1033, 2004. 



33 



G. Nuel. Effective p-value computations using Finite Markov Chain Imbedding 
(FMCI): application to local score and to pattern statistics. Algorithms for 
Molecular Biology, 1(1) :5, 2006a. 

G. Nuel. Numerical solutions for patterns statistics on Markov chains. Stat. 
App. in Genet, and Mol. Biol, 5(1):26, 2006b. 

G. Nuel. Pattern statistics on markov chains and sensitivity to parameter esti- 
mation. Algorithms for Molecular Biology, 1(1):17, 2006c. 

G. Nuel. Pattern Markov chains: optimal Markov chain embedding through 
deterministic finite automata. J. of Applied Prob., 45(l):226-243, 2008. 

G. Nuel. Bioinformatics: Trends and Methodologies, chap- 

ter Significance Score of Motifs in Biological Sequences. In- 
tech, 2011. ISBN: 978-953-307-282-1. Available at http://www. 
intechopen. com/books/bioinf ormatics-trends-cind-methodologies/ 
significcince- score- of -motifs- in-biological- sequences. 

G. Nuel, L. Regad, J. Martin, and A.-C. Camproux. Exact distribution of a pat- 
tern in a set of random sequences generated by a markov source: applications 
to biological data. Algorithms for Molecular Biology, 5, 2010. 

P.A. Pevzner, M.Y. Borodovski, and A. A. Mironov. Linguistic of nucleotide 
sequences: The significance of deviation from mean statistical characteristics 
and prediction of frequencies of occurrence of words. J. Biomol. Struct. Dyn., 
6:1013-1026, 1989. 

B. Prum, F. Rodolphe, and E. de Turckheim. Finding words with unexpected 
frequencies in DNA sequences. J. R. Statist. Soc. B, 11:190-192, 1995. 

M. Reignier. A unified approach to word occurrences probabilities. Discrete 
Applied Mathematics, 104(1) :259-280, 2000. 

G. Reinert and S. Schbath. Compound Poisson and Poisson process approxima- 
tions for occurrences of multiple words in markov chains. J. of Comp. Biol., 
5:223-254, 1999. 

P. Ribeca and E. Raineri. Faster exact Markovian probability functions for 
motif occurrences: a DFA-only approach. Bioinformatics, 24(24) :2839-2848, 
2008. 

Bruno Salvy. Solutions rationnelles de systemes lineaires a coefficients polyno- 
miaux. In Algorithmes en Calcul Formel et en Automatique, chapter 7. 2009. 
http://algo. inria.fr/chyzak/mpri/poly-20091201 .pdf. 

V. Stefanov and A. G. Pakes. Explicit distributional results in pattern formation. 
Ann. Appl. Probab., 7:666-678, 1997. 



34 



V. T. Stcfanov and W. Szpankowski. Waiting Time Distributions for Pattern 
Occurrence in a Constrained Sequence. Discrete Mathematics and Theoretical 
Computer Science, 9(l):305-320, 2007. 

Arne Storjohann. High-order lifting and integrality certification. Journal of 
Symbolic Computation, 36(3-4) :613-648, 2003. 

J. van Helden, B. Andre, and J. Collado-Vides. Extracting regulatory sites 
from the upstream region of yeast genes by computational analysis of oligonu- 
cleotide frequencies. J. Mol. Biol, 28f (5j:827-842, f998. 



35 



